Chinese Word Segmentation in FTRD Beijing

نویسندگان

  • Heng Li
  • Yuan Dong
  • Xinnian Mao
  • Haila Wang
  • Wu Liu
چکیده

This paper presents a word segmentation system in France Telecom R&D Beijing, which uses a unified approach to word breaking and OOV identification. The output can be customized to meet different segmentation standards through the application of an ordered list of transformation. The system participated in all the tracks of the segmentation bakeoff -PK-open, PKclosed, AS-open, AS-closed, HK-open, HK-closed, MSR-open and MSRclosed -and achieved the state-of-theart performance in MSR-open, MSRclose and PK-open tracks. Analysis of the results shows that each component of the system contributed to the scores.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Incorporating New Words Detection with Chinese Word Segmentation

With development in Chinese words segmentation, in-vocabulary word segmentation and named entity recognition achieves state-of-art performance. However, new words become bottleneck to Chinese word segmentation. This paper presents the result from Beijing Institute of Technology (BIT) in the Sixth International Chinese Word Segmentation Bakeoff in 2010. Firstly, the author reviewed the problem c...

متن کامل

Identification of Chinese Personal Names in Unrestricted Texts

Automatic identification of Chinese personal names in unrestricted texts is a key task in Chinese word segmentation, and can affect other NLP tasks such as word segmentation and information retrieval, if it is not properly addressed. This paper (1) demonstrates the problems of Chinese personal name identification in some IT applications, (2) analyzes the structure of Chinese personal names, and...

متن کامل

Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff

In this paper, we roughly described the procedures of our segmentation system, including the methods for resolving segmentation ambiguities and identifying unknown words. The CKIP group of Academia Sinica participated in testing on open and closed tracks of Beijing University (PK) and Hong Kong Cityu (HK). The evaluation results show our system performs very well in either HK open track or HK c...

متن کامل

A Hybrid Approach to Chinese Word Segmentation around CRFs

In this paper, we present a Chinese word segmentation system which is consisted of four components, i.e. basic segmentation, named entity recognition, error-driven learner and new word detector. The basic segmentation and named entity recognition, implemented based on conditional random fields, are used to generate initial segmentation results. The other two components are used to refine the re...

متن کامل

Design of CKIP Chinese Word Segmentation System

In this paper, we describe the design of the CKIP Chinese word segmentation system and analyse its performance. The system utilizes a modulized approach. Independent modules were designed to solve the problems of segmentation ambiguities and identifying unknown words. Segmentation ambiguities are resolved by a hybrid method of using heuristic and statistical rules. Regular-type unknown words ar...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005